271 research outputs found

    COMPENDIUM: a text summarisation tool for generating summaries of multiple purposes, domains, and genres

    In this paper, we present a Text Summarisation tool, COMPENDIUM, capable of generating the most common types of summaries. Regarding the input, single- and multi-document summaries can be produced; as the output, the summaries can be extractive or abstractive-oriented; and finally, concerning their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture for COMPENDIUM is divided into various stages, making a distinction between core and additional stages. The former constitute the backbone of the tool and are common to the generation of any type of summary, whereas the latter are used for enhancing the capabilities of the tool. The main contributions of COMPENDIUM with respect to the state-of-the-art summarisation systems are that (i) it specifically deals with the problem of redundancy, by means of textual entailment; (ii) it combines statistical and cognitive-based techniques for determining relevant content; and (iii) it proposes an abstractive-oriented approach for facing the challenge of abstractive summarisation. The evaluation performed in different domains and textual genres, comprising traditional texts, as well as texts extracted from the Web 2.0, shows that COMPENDIUM is very competitive and appropriate to be used as a tool for generating summaries. This research has been supported by the project “Desarrollo de Técnicas Inteligentes e Interactivas de Minería de Textos” (PROMETEO/2009/119) and the project reference ACOMP/2011/001 from the Valencian Government, as well as by the Spanish Government (grant no. TIN2009-13391-C04-01).

    Towards automatic tweet generation: A comparative study from the text summarization perspective in the journalism genre

    In recent years, Twitter has become one of the most important microblogging services of the Web 2.0. Among the possible uses it allows, it can be employed for communicating and broadcasting information in real time. The goal of this research is to analyze the task of automatic tweet generation from a text summarization perspective in the context of the journalism genre. To achieve this, different state-of-the-art summarizers are selected and employed for producing tweets in two languages (English and Spanish). A wide experimental framework is proposed, comprising the creation of a new corpus, the generation of the automatic tweets, and their assessment through a quantitative and a qualitative evaluation, where informativeness, indicativeness and interest are key criteria that should be ensured in the proposed context. From the results obtained, it was observed that although the original tweets were considered as model tweets with respect to their informativeness, they were not among the most interesting ones from a human viewpoint. Therefore, relying only on these tweets may not be the ideal way to communicate news through Twitter, especially if a more personalized and catchy way of reporting news is desired. In contrast, we showed that recent text summarization techniques may be more appropriate, reflecting a balance between indicativeness and interest, even if their content was different from the tweets delivered by the news providers. This research work has been partially funded by the Spanish Government (Ministerio de Economía y Competitividad) through the project “Técnicas de Deconstrucción en las Tecnologías del Lenguaje Humano” (TIN2012-31224), and by the Valencian Government through projects PROMETEO (PROMETEO/2009/199) and ACOMP/2011/001.

    UH-MatCom at eHealth-KD Challenge 2020: Deep-Learning and Ensemble Models for Knowledge Discovery in Spanish Documents

    The eHealth-KD challenge hosted at IberLEF 2020 proposes a set of resources and evaluation scenarios to encourage the development of systems for the automatic extraction of knowledge from unstructured text. This paper describes the system presented by team UH-MatCom in the challenge. Several deep-learning models are trained and ensembled to automatically extract relevant entities and relations from plain text documents. State-of-the-art techniques such as BERT, Bi-LSTM, and CRF are applied. The use of external knowledge sources such as ConceptNet is explored. The system achieved average results in the challenge, ranking fifth across all different evaluation scenarios. The ensemble method produced a slight improvement in performance. Additional work needs to be done for the relation extraction task to successfully benefit from external knowledge sources. This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089).

    Towards the Design of a Textile Chemical Ontology

    The main goal of this paper is to present the initial version of a Textile Chemical Ontology, to be used by textile professionals with the purpose of conceptualising and representing the banned and harmful chemical substances that are forbidden in this domain. After analysing different methodologies and determining that “Methontology” is the most appropriate for the purposes, this methodology is explored and applied to the domain. In this manner, an initial set of concepts is defined, together with their hierarchy and the relationships between them. This paper shows the benefits of using the ontology through a real use case in the context of Information Retrieval. The potential shown by the proposed ontology in this preliminary evaluation encourages extending the ontology with a higher number of concepts and relationships, and validating it within other Natural Language Processing applications. This research is partially funded by the European Commission under the Seventh (FP7-2007-2013) Framework Programme for Research and Technological Development through the FIRST project (FP7-287607). Moreover, it has been partially funded by the Spanish Government through the projects “Análisis de Tendencias Mediante Técnicas de Opinión Semántica” (TIN2012-38536-C03-03) and “Técnicas de Deconstrucción en las Tecnologías del Lenguaje Humano” (TIN2012-31224) and by the Generalitat Valenciana (project grant ACOMP/2013/067).

    Heritability of Batrachochytrium dendrobatidis burden and its genetic correlation with development time in a population of Common toad (Bufo spinosus)

    Despite the important threat that emerging pathogens pose for the conservation of biodiversity as well as human health, very little is known about the adaptive potential of host species to withstand infections. We studied the quantitative genetic architecture responsible for the burden of the fungal pathogen Batrachochytrium dendrobatidis in a population of common toads in conjunction with other life-history traits (i.e., body size and development rate) that may be affected by common selective pressures. We found a significant heritable component that is associated with fungal burden, which may allow for local adaptation to this pathogen to proceed. In addition, the high genetic correlation found between fungal burden and development time suggests that both traits have to be taken into account in order to assess the adaptive response of host populations to this emerging pathogen. Our research was supported by the following grants: Spanish Ministry of Education reference CGL2011-23443, Ministry of Economy and Competitiveness reference BES-2012-055220, and Spanish Organization of National Parks reference MARM 428/211. Peer reviewed.

    The role of statistical and semantic features in single-document extractive summarization

    This paper reports further results of ongoing research analyzing the impact of a range of commonly used statistical and semantic features in the context of extractive text summarization. The features experimented with include word frequency, inverse sentence and term frequencies, stopword filtering, word senses, resolved anaphora and textual entailment. The obtained results demonstrate the relative importance of each feature and the limitations of the tools available. It has been shown that the inverse sentence frequency combined with the term frequency yields almost the same results as the term frequency combined with stopword filtering, which in turn proved to be a highly competitive baseline. To improve the suboptimal results of anaphora resolution, the system was extended with a second anaphora resolution module. The present paper also describes the first attempts at an internal document data representation.

    Paralelismo sintáctico-semántico para el tratamiento de elementos extrapuestos en textos no restringidos

    In this paper we present a method based on the theory of parallelism for identifying and resolving extraposed elements in unrestricted texts. This theory of parallelism is based on (Palomar 96) and is extended with partial parsing techniques, in which only the relevant parts of the text are studied, that facilitate the resolution of linguistic phenomena. We rely on extended Datalog programs (Dahl 94) (Dahl 95) as a tool for defining and implementing grammars. These grammars are based not on grammatical rules but on the detection of relevant information, which relaxes the process and widens the potential set of texts that can be analysed. This paper has been funded by the CICYT project no. TIC97-0671-C02-01/02.

    Application of Text Summarization techniques to the Geographical Information Retrieval task

    Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering or Text Classification and other related fields of computer science such as Information Retrieval. Since Geographical Information Retrieval can be considered as an extension of the Information Retrieval field, the generation of summaries could be integrated into these systems by acting as an intermediate stage, with the purpose of reducing the document length. In this manner, the access time for information searching will be improved, while at the same time relevant documents will also be retrieved. Therefore, in this paper we propose the generation of two types of summaries (generic and geographical) applying several compression rates in order to evaluate their effectiveness in the Geographical Information Retrieval task. The evaluation has been carried out using GeoCLEF as evaluation framework and following an Information Retrieval perspective without considering the geo-reranking phase commonly used in these systems. Although single-document summarization has not performed well in general, the slight improvements obtained for some types of the proposed summaries, particularly for those based on geographical information, lead us to believe that the integration of Text Summarization with Geographical Information Retrieval may be beneficial, and consequently, the experimental set-up developed in this research work serves as a basis for further investigations in this field. This work has been partially funded by the European Commission under the Seventh (FP7-2007-2013) Framework Programme for Research and Technological Development through the FIRST project (FP7-287607). It has also been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), projects TEXT-MESS 2.0 (TIN2009-13391-C04-01) and TEXT-COOL 2.0 (TIN2009-13391-C04-02) from the Spanish Government, a grant from the Valencian Government, project "Desarrollo de Técnicas Inteligentes e Interactivas de Minería de Textos" (PROMETEO/2009/119), and grant no. ACOMP/2011/001.

    Effect of insecticides on Trichogramma chilonis L., egg parasitoid of large cabbage moth, Crocidolomia pavonana F.

    The study was carried out to examine the effects of key insecticides on Trichogramma chilonis parasitism of large cabbage moth (LCM). Three days after spraying with Attack™, Orthene™ and Entrust™ (permethrin + pirimiphos-methyl, acephate and spinosad, respectively), no parasitism of LCM eggs occurred. After 3 days of Bacillus thuringiensis (Bt) treatment, parasitism of LCM egg mass was 100 %, which is the same as the control. No parasitism of the egg mass occurred after spraying with either Attack™ or Orthene™. The percentage of parasitised LCM eggs after Bt treatment was 13.48 %; the control showed the highest parasitism of LCM eggs (58.13 %). The mortality of T. chilonis adults due to the insecticides after 15 hours was, in descending order, Entrust, Attack, Orthene and Bt. The result suggests that Bt could be included in Integrated Pest Management Programmes that depend on T. chilonis parasitism of LCM eggs and T. chilonis activity.